MDL Histogram Density Estimation
Authors
Abstract
We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle, which can be applied to tasks such as data clustering, density estimation, image denoising, and model selection in general. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has several desirable optimality properties. We show how this framework can be applied to learning generic, irregular (variable-width bin) histograms, and how to compute the NML model selection criterion efficiently. We also derive a dynamic programming algorithm for finding both the MDL-optimal bin count and the cut point locations in polynomial time. Finally, we demonstrate our approach via simulation tests.
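The abstract does not give formulas, so the following is only a rough sketch of how such a criterion and search could look, under stated assumptions. It assumes the histogram's NML normalizing constant equals that of a K-valued multinomial (computed here with the known recurrence C(K+2, n) = C(K+1, n) + (n/K) C(K, n)), that data are recorded at a precision `eps`, and that choosing K - 1 cut points among E candidates costs log C(E, K-1). The function names and the plain segmentation-style dynamic program are illustrative choices, not necessarily the authors' formulation.

```python
import bisect
import math


def multinomial_nml_constant(K, n):
    """NML normalizing constant C(K, n) of a K-valued multinomial (small n only)."""
    if n == 0 or K == 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    # C(2, n) by direct summation; Python evaluates 0.0 ** 0 as 1.0.
    c_curr = sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                 for h in range(n + 1))
    for k in range(1, K - 1):  # C(k+2, n) = C(k+1, n) + (n/k) * C(k, n)
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def segment_cost(h, width, n, eps):
    """Negative ML log-likelihood of h points in one bin, at data precision eps."""
    return 0.0 if h == 0 else -h * math.log(eps * h / (width * n))


def penalty(K, n, E):
    """Assumed complexity term: log C(K, n) plus log of (E choose K-1)."""
    return math.log(multinomial_nml_constant(K, n)) + math.log(math.comb(E, K - 1))


def code_length(data, cuts, eps, E):
    """Code length of data for bin edges `cuts` (data assumed inside the edges)."""
    xs, n, K = sorted(data), len(data), len(cuts) - 1
    below = [0] + [bisect.bisect_right(xs, c) for c in cuts[1:]]
    fit = sum(segment_cost(below[k + 1] - below[k], cuts[k + 1] - cuts[k], n, eps)
              for k in range(K))
    return fit + penalty(K, n, E)


def best_histogram(data, edges, eps, max_bins):
    """Pick bin count and cut points among sorted candidate `edges`.

    The first and last entries of `edges` are fixed outer boundaries.
    """
    xs, n, m = sorted(data), len(data), len(edges)
    E = m - 2  # number of interior candidate cut points
    below = [0] + [bisect.bisect_right(xs, c) for c in edges[1:]]

    def cost(i, j):
        return segment_cost(below[j] - below[i], edges[j] - edges[i], n, eps)

    # fit[K][j]: best fit cost of covering edges[0]..edges[j] with exactly K bins.
    fit = [[math.inf] * m for _ in range(max_bins + 1)]
    back = [[-1] * m for _ in range(max_bins + 1)]
    fit[0][0] = 0.0
    for K in range(1, max_bins + 1):
        for j in range(1, m):
            for i in range(K - 1, j):
                c = fit[K - 1][i] + cost(i, j)
                if c < fit[K][j]:
                    fit[K][j], back[K][j] = c, i
    feasible = [K for K in range(1, max_bins + 1) if fit[K][m - 1] < math.inf]
    best_K = min(feasible, key=lambda K: fit[K][m - 1] + penalty(K, n, E))
    # Walk the back-pointers from the last edge to recover the chosen cut points.
    cuts, j = [edges[m - 1]], m - 1
    for K in range(best_K, 0, -1):
        j = back[K][j]
        cuts.append(edges[j])
    cuts.reverse()
    return cuts, fit[best_K][m - 1] + penalty(best_K, n, E)


# Toy data: two clusters, recorded to one decimal place (eps = 0.1).
data = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 2.6, 2.7, 2.8, 2.8, 2.9, 3.0]
edges = [0.05, 0.45, 1.0, 2.0, 2.55, 3.05]
cuts, score = best_histogram(data, edges, eps=0.1, max_bins=4)
print(cuts, round(score, 3))
```

Because the complexity term depends only on the bin count, this sketch minimizes the fit term separately for each K and adds the penalty afterwards. The triple loop runs in O(E² · max_bins) time, which is polynomial in the number of candidate cut points.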
Similar resources
Information-Theoretically Optimal Histogram Density Estimation
We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has several desirable optimality properties. We show how this approach can be applied for learning generic, irregular (variable-wi...
Computationally Efficient Methods for MDL-Optimal Density Estimation and Data Clustering
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. T...
NML-Optimal Histogram Density Estimation
Density estimation is one of the central problems in statistical inference and machine learning. Given a sample of observations, the goal of histogram density estimation is to find a piecewise constant density that describes the data best according to some pre-determined criterion. Although histograms are conceptually simple densities, they are very flexible and can model complex properties lik...
Extensions to MDL denoising
The minimum description length principle in wavelet denoising can be extended from the standard linear-quadratic setting in several ways. We describe briefly three extensions: soft thresholding, histogram modeling and a multicomponent approach. The MDL hard thresholding approach based on the normalized maximum likelihood universal modeling can be extended to include soft thresholding shrinkage,...
Exact Minimax Predictive Density Estimation and MDL
The problems of predictive density estimation with Kullback-Leibler loss, optimal universal data compression for MDL model selection, and the choice of priors for Bayes factors in model selection are interrelated. Research in recent years has identified procedures which are minimax for risk in predictive density estimation and for redundancy in universal data compression. Here, after reviewing ...